{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Tce3stUlHN0L"
      },
      "source": [
        "##### Copyright 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "tuOe1ymfHZPu"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "qFdPvlXBOdUN"
      },
      "source": [
        "# Introduction to Gradients and Automatic Differentiation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "MfBg1C5NB3X0"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/guide/autodiff\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/autodiff.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/guide/autodiff.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a href=\"https://storage.googleapis.com/tensorflow_docs/docs/site/en/guide/autodiff.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "r6P32iYYV27b"
      },
      "source": [
        "## Automatic Differentiation and Gradients\n",
        "\n",
        "[Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation)\n",
        "is useful for implementing machine learning algorithms such as\n",
        "[backpropagation](https://en.wikipedia.org/wiki/Backpropagation) for training\n",
        "neural networks. \n",
        "\n",
        "In this guide, we will discuss ways you can compute gradients with TensorFlow, especially in eager execution."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "MUXex9ctTuDB"
      },
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "IqR2PQG4ZaZ0"
      },
      "outputs": [],
      "source": [
        "import tensorflow as tf"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "xHxb-dlhMIzW"
      },
      "source": [
        "## Computing gradients\n",
        "\n",
        "To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the *forward* pass.  Then, during the *backward pass*, TensorFlow traverses this list of operations in reverse order to compute gradients."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "1CLWJl0QliB0"
      },
      "source": [
        "## Gradient tapes\n",
        "\n",
        "TensorFlow provides the [tf.GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape) API for automatic differentiation; that is, computing the gradient of a computation with respect to its input variables. TensorFlow \"records\" all operations executed inside the context of a `tf.GradientTape` onto a \"tape\". TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of a \"recorded\" computation using [reverse mode differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation).\n",
        "\n",
        "With scalars:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "LsvrwF6bHroC"
      },
      "outputs": [],
      "source": [
        "x = tf.constant(3.0)\n",
        "# y = x ^ 2\n",
        "with tf.GradientTape() as t:\n",
        "  t.watch(x)\n",
        "  y = x * x\n",
        "# dy = 2x\n",
        "dy_dx = t.gradient(y, x)\n",
        "dy_dx.numpy()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "jen-yWFxIAWQ"
      },
      "source": [
        "Using matrices:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "bAFeIE8EuVIq"
      },
      "outputs": [],
      "source": [
        "x = tf.constant([3.0, 3.0])\n",
        "\n",
        "with tf.GradientTape() as t:\n",
        "  t.watch(x)\n",
        "  z = tf.multiply(x, x)\n",
        "\n",
        "print(z)\n",
        "\n",
        "# Find derivative of z with respect to the original input tensor x\n",
        "print(t.gradient(z, x))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "N4VlqKFzzGaC"
      },
      "source": [
        "All operations on `tf.Variable` are added to the tape.  To record gradients with respect to constant tensor inputs (as above), you need to add `t.watch(x)` to tell the gradient tape to track the input tensor.\n",
        "\n",
        "Note: A common problem is that inputs are not `tf.Tensors`, but some tensor-like structure like a NumPy `ndarray` or a Python list.  This kind of input will be converted to tensors within the first op, but do not start as tensors, which may result in confusing `None` gradients or errors because you have accidentally passed in integers (which cannot form gradients).  To forestall this issue, use `tf.constant` with an appropriate `dtype`, or `tf.data.from_tensor_slices` to convert your input data into tensors or a `tf.data.Dataset`."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "2g1nKB6P-OnA"
      },
      "source": [
        "### Intermediate results\n",
        "\n",
        "You can also request gradients of the output with respect to intermediate values computed during a \"recorded\" `tf.GradientTape` context."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "7XaPRAwUyYms"
      },
      "outputs": [],
      "source": [
        "x = tf.constant([3.0, 3.0])\n",
        "\n",
        "with tf.GradientTape() as t:\n",
        "  t.watch(x)\n",
        "  y = tf.multiply(x, x)\n",
        "  z = tf.multiply(y, y)\n",
        "\n",
        "# Use the tape to compute the derivative of z with respect to the\n",
        "# intermediate value y.\n",
        "# dz_dy = 2 * y, where y = x ^ 2\n",
        "print(t.gradient(z, y))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "RkcpQnLgNxgi"
      },
      "source": [
        "Gradient tapes automatically watch any variable accessed within their scope.  You can list them in order they are created."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "3AhiLRS8N7QY"
      },
      "outputs": [],
      "source": [
        "x = tf.Variable(2.0)\n",
        "y = tf.Variable(3.0)\n",
        "\n",
        "with tf.GradientTape() as t:\n",
        "  # Don't need any calls to watch(), as when variables x and y\n",
        "  # get used, they will be automatically watched\n",
        "  y_sq = y * y\n",
        "  z = x + y_sq\n",
        "\n",
        "t.watched_variables()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ISkXuY7YzIcS"
      },
      "source": [
        "By default, the resources held by a `GradientTape` are released as soon as `GradientTape.gradient()` method is called. To compute multiple gradients over the same computation, create a `persistent` gradient tape. This allows multiple calls to the `gradient()` method as resources are released when the tape object is garbage collected. For example:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "zZaCm3-9zVCi"
      },
      "outputs": [],
      "source": [
        "x = tf.constant(3.0)\n",
        "with tf.GradientTape(persistent=True) as t:\n",
        "  t.watch(x)\n",
        "  y = x * x\n",
        "  z = y * y\n",
        "print(t.gradient(z, x))  # 108.0 (4*x^3 at x = 3)\n",
        "print(t.gradient(y, x))  # 6.0 (2 * x)\n",
        "del t  # Drop the reference to the tape"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "O_ZY-9BUB7vX"
      },
      "source": [
        "### Notes on performance\n",
        "\n",
        "* There is a tiny overhead associated with doing operations inside a gradient tape context. For most eager execution this will not be a noticeable cost, but you should still use tape context around the areas only where it is required.\n",
        "\n",
        "* Gradient tapes use memory to store intermediate results, including inputs and outputs, for use during the backwards pass.\n",
        "\n",
        "  For efficiency, some ops (like `ReLU`) don't need to keep their intermediate results and they are pruned during the forward pass. However, if you use `persistent=True` on your tape, *nothing is discarded* and your peak memory usage will be higher."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "6kADybtQzYj4"
      },
      "source": [
        "### Recording control flow\n",
        "\n",
        "Because tapes record operations as they are executed, Python control flow (using `if`s and `while`s for example) is naturally handled:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "9FViq92UX7P8"
      },
      "outputs": [],
      "source": [
        "def f(x, y):\n",
        "  output = 1.0\n",
        "  for i in range(y):\n",
        "    if i > 1 and i < 5:\n",
        "      output *= x  # tf.multiply(output, x)\n",
        "  return output\n",
        "\n",
        "def grad(x, y):\n",
        "  with tf.GradientTape() as t:\n",
        "    t.watch(x)\n",
        "    out = f(x, y)\n",
        "  return t.gradient(out, x)\n",
        "\n",
        "x = tf.constant(2.0)\n",
        "\n",
        "print(grad(x, 6).numpy())  # 12.0\n",
        "print(grad(x, 5).numpy())  # 12.0\n",
        "print(grad(x, 4).numpy())  # 4.0\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "egypBxISAHhx"
      },
      "source": [
        "### Getting a gradient of `None`\n",
        "\n",
        "When a gradient is not available for any reason, you will usually get a gradient of `None`.\n",
        "\n",
        "A common situation is trying to get the gradient of a non-float and non-complex value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "9jlHXHqfASU3"
      },
      "outputs": [],
      "source": [
        "with tf.GradientTape() as tape:\n",
        "  x = tf.Variable([[2, 2], [2, 2]], dtype=tf.int8)\n",
        "  y = x * x\n",
        "print(tape.gradient(y, x))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "4PVxPRC2AS5p"
      },
      "source": [
        "In other situations, you may be doing operations that are not differentiable. `DecodeJpeg`, for example, does not produce a gradient.  To see which operations have gradients registered for them, see the list of [raw ops](https://www.tensorflow.org/api_docs/python/tf/raw_ops).\n",
        "\n",
        "Finally, you may just be deriving gradients where there is no connection between the input and out."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "CU185WDM81Ut"
      },
      "outputs": [],
      "source": [
        "x = tf.Variable(2.)\n",
        "y = tf.Variable(3.)\n",
        "\n",
        "with tf.GradientTape() as tape:\n",
        "  z = y * y\n",
        "print(tape.gradient(x, y))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "TYDrVogA89eA"
      },
      "source": [
        "It may be convenient to return 0 instead of `None`.  You can decided what to return when you have unconnected gradients using the `unconnected_gradient` argument."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "U6zxk1sf9Ixx"
      },
      "outputs": [],
      "source": [
        "x = tf.Variable(2.)\n",
        "y = tf.Variable(3.)\n",
        "\n",
        "with tf.GradientTape() as tape:\n",
        "  z = y * y\n",
        "print(tape.gradient(x, y, unconnected_gradients=tf.UnconnectedGradients.ZERO))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "mbb-9lnGVngH"
      },
      "source": [
        "## Custom gradients\n",
        "\n",
        "In some cases, you may want to control exactly how gradients are calculated rather than using the default.  These situations include:\n",
        "\n",
        "* There is no defined gradient for a new op you are writing.\n",
        "* The default calculations are numerically unstable.\n",
        "* You wish to cache an expensive computation from the forward pass.\n",
        "* You want to modify  a value (for example using: `tf.clip_by_value`, `tf.math.round`) without modifying the gradient.\n",
        "\n",
        "For writing a new op, you can use `tf.RegisterGradient` to set up your own. See that page for details. (Note that the gradient registry is global, so change it with caution.)\n",
        "\n",
        "For the latter two cases, you can use `tf.custom_gradient`.  Here is an example that applies `tf.clip_by_norm` to the gradient.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "Mjj01w4NYtwd"
      },
      "outputs": [],
      "source": [
        "# Establish an identity operation, but clip during the gradient pass\n",
        "@tf.custom_gradient\n",
        "def clip_gradients(y):\n",
        "  def backward(dy):\n",
        "    return tf.clip_by_norm(dy, 0.5)\n",
        "  return y, backward\n",
        "\n",
        "v = tf.Variable(2.0)\n",
        "with tf.GradientTape() as t:\n",
        "  output = clip_gradients(v * v)\n",
        "print(t.gradient(output, v))  # calls \"backward\", which clips 4 to 2\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "n4t7S0scYrD3"
      },
      "source": [
        "See the `tf.custom_gradient` decorator for more details."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "DK05KXrAAld3"
      },
      "source": [
        "### Higher-order gradients\n",
        "\n",
        "Operations inside of the `GradientTape` context manager are recorded for automatic differentiation. If gradients are computed in that context, then the gradient computation is recorded as well. As a result, the exact same API works for higher-order gradients as well. For example:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "cPQgthZ7ugRJ"
      },
      "outputs": [],
      "source": [
        "x = tf.Variable(1.0)  # Create a Tensorflow variable initialized to 1.0\n",
        "\n",
        "with tf.GradientTape() as t:\n",
        "  with tf.GradientTape() as t2:\n",
        "    y = x * x * x\n",
        "\n",
        "  # Compute the gradient inside the 't' context manager\n",
        "  # which means the gradient computation is differentiable as well.\n",
        "  dy_dx = t2.gradient(y, x)\n",
        "d2y_dx2 = t.gradient(dy_dx, x)\n",
        "\n",
        "print(\"dy_dx:\", dy_dx.numpy())  # 3.0\n",
        "print(\"d2y_dx2:\", d2y_dx2.numpy())  # 6.0"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "fc-PocLB-TP3"
      },
      "source": [
        "## Advanced features for `GradientTape`\n",
        "\n",
        "In this section, we examine some less-common uses of `GradientTape`."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "uGRJJRi8TCkJ"
      },
      "source": [
        "### Controlling gradient recording\n",
        "\n",
        "You can choose to only watch individual variables.  To do so, you can turn off the default watching of all variables.  In that case, you will be required to watch all variables manually."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "KT6E-xYUKpGN"
      },
      "outputs": [],
      "source": [
        "x = tf.Variable(2.0)\n",
        "y = tf.Variable(3.0)\n",
        "\n",
        "with tf.GradientTape(watch_accessed_variables=False, persistent=True) as t:\n",
        "  # Only watch y\n",
        "  t.watch(y)\n",
        "  y_sq = y * y\n",
        "  z = x + y_sq\n",
        "\n",
        "# Gradient will be None, as x is not watched, so the tape cannot\n",
        "# differentiate it\n",
        "print(\"Gradient with respect to x:\", t.gradient(z, x))\n",
        "# Gradient will be 6, as y is watched\n",
        "print(\"Gradient with respect to y:\", t.gradient(z, y))\n",
        "del t"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "gB_i0VnhQKt2"
      },
      "source": [
        "If you wish to stop recording gradients, you can use `stop_recording()` to temporarily suspend recording. \n",
        "\n",
        "If you wish to start over entirely, use `reset()`.  Simply exiting the gradient tape block and restarting is usually easier to read, but you can use `reset` when exiting the tape block is difficult or imnpossible."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "mhFSYf7uQWxR"
      },
      "outputs": [],
      "source": [
        "x = tf.Variable(2.0)\n",
        "y = tf.Variable(3.0)\n",
        "\n",
        "with tf.GradientTape(persistent=True) as t:\n",
        "  y_sq = y * y\n",
        "  # Stop recording so we can calculate an intermediate gradient\n",
        "  with t.stop_recording():\n",
        "    print(\"y with respect to y_sq:\", t.gradient(y_sq, y))  # Will be 1\n",
        "  z = x + y_sq\n",
        "  t.reset()\n",
        "\n",
        "print(\"z with respect to y after reset:\", t.gradient(z, y))  # None"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [
        "Tce3stUlHN0L"
      ],
      "name": "autodiff.ipynb",
      "private_outputs": true,
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
